A Solvable Model of Secondary Structure Formation 
in Random Hetero-Polymers 

N.S. Skantzos, J. van Mourik and A.C.C. Coolen 

Department of Mathematics, King's College London 
The Strand, London WC2R 2LS, U.K. 

Abstract. We propose and solve a simple model describing secondary structure 
formation in random hetero-polymers. It describes monomers with a combination 
of one-dimensional short-range interactions (representing steric forces and hydrogen 
bonds) and infinite range interactions (representing polarity forces). We solve our 
model using a combination of mean field and random field techniques, leading to 
phase diagrams exhibiting second-order transitions between folded, partially folded 
and unfolded states, including regions where folding depends on initial conditions. 
Our theoretical results, which are in excellent agreement with numerical simulations, 
lead to an appealing physical picture of the folding process: the polarity forces drive 
the transition to a collapsed state, the steric forces introduce monomer specificity, and 
the hydrogen bonds stabilise the conformation by damping the frustration-induced 
multiplicity of states. 



PACS numbers: 61.41.+e, 75.10.Nr 
1. Introduction 

Proteins are polymeric chains of amino-acids. The successful functioning of a protein 
in a living organism depends crucially, among other factors, on its ability to fold into 
a desired three-dimensional structure (its 'native state'), and to subsequently attach 
in a very specific way to other macro-molecules. From a biological and medical point 
of view, it is therefore highly desirable to know which native state corresponds to a 
given amino-acid sequence, and (conversely, for therapeutic purposes) to know which 
amino-acid sequence would fold into a desired native state; this requires a quantitative 
understanding of the physical forces underlying the folding mechanism. A detailed 
identification of sequence-specific native states will necessarily involve sophisticated 
(molecular dynamics based) computational methods. However, due to the large number 
of degrees of freedom of proteins, the complicated nature of the various types of electro- 
chemical interactions and the so-called 'hard' geometric chain constraints of a protein, 
such computer programmes are unfortunately (as yet) extremely slow. Thus, in order to 
identify the role and degree of importance of the various folding parameters, a theoretical 
(i.e. statistical mechanical) analysis would be very welcome. 
It is generally assumed that the presently observed population of real proteins has 
evolved from the larger class of random hetero-polymers, driven by natural selection. 
This suggests that the study of random hetero-polymers is a natural first step en route 
towards the statistical mechanical study of proteins. Furthermore, already at an early 
stage it was recognized [1], via a theoretical study based on the random energy scheme 
[?], that many aspects of protein folding (such as the appearance of 'mis-folded' phases, 
and transitions between folded and unfolded states) can be understood on the basis of 
equilibrium statistical mechanical calculations for random hetero-polymers. Even simple 
models with only two types of amino-acids interacting with the water solvent, viz. 
hydrophobic amino-acids versus polar ones, can successfully describe the basics of protein 
folding (see e.g. the so-called HP model [?]). Further statistical mechanical approaches 
include replica calculations on polymer chains with Gaussian pair interactions [?], 
variational analyses in replica spaces [?], lattice models [?] and lattice gas models 
[10], to mention but a few. In most of these examples, analytical solvability relies on 



the absence of spatial structure, which allows for more or less conventional mean-field 
statistical mechanics. 

In this paper we extend the class of analytically solvable models in this field. 
We present a model for secondary structure formation in random hetero-polymers 
consisting of amino-acid monomers which are allowed to interact in three qualitatively 
different ways: (i) via so-called steric interactions, which reflect monomer-specific 
geometric constraints and electrical forces determining the local energy landscape for 
the orientation of monomer-connecting links, (ii) via hydrogen-bonding, which acts over 
larger distances along the chain, and is believed to play a role in the stabilization of helix- 
type structures, and (iii) via polarity-induced energy gradients, which tend to promote 
states in which the hydrophobic amino-acids are more or less turned towards the same 
side of the polymeric chain, in order to enable effective shielding from water molecules 
via folding of the polymer as a whole. Interactions (i) and (ii) are of a short-range 
nature, whereas (iii) is long-range. We note that secondary structure formation has also 
been studied within a mean-field approach in [11], and that a combination of different 
types of monomer interactions has been proposed previously in [?]. In the latter study, 
assuming statistical independence of energy levels, the random energy scheme could 
provide qualitative results; however, the validity of this approach has since then been 
questioned [12]. In contrast, our solution does not employ random energy considerations. 
It is based on a combination of mean-field and random transfer-matrix techniques, which 
in one-dimensional models are known to reduce the evaluation of the partition function 
to a relatively simple numerical problem. Due to the presence of additional long-range 
interactions (via polarity-induced forces) our model no longer lies in the universality 
class of one dimensional systems, and phase transitions are therefore possible (and will 
indeed occur) at finite temperatures. 

Our paper is organized as follows. We first define our model and the relevant 
macroscopic observables. Since the disordered infinite-range (polarity induced) part of 
our Hamiltonian, which drives the collapse to a folded state, is different in structure 
Figure 1. Illustration of the physical meaning of our clock-state spin variables $\phi_i$. A 
spin state $\phi_i$ represents the physical location of an individual monomer, relative to the 
one-dimensional polymer chain axis (the 'backbone', drawn as a dashed line). In this 
graph the number of possible locations for any given monomer is $q = 3$. The black 
blobs represent locations occupied by a monomer. 



from the more familiar Mattis-like [13] terms in mean-field spin systems, we first solve 
our model for the case where only polarity energies are present. We then proceed to 
the solution of the full model, with all three interaction terms present, but now limiting 
ourselves (for simplicity) to the simplest choice of angular variables. Our phase diagrams 
exhibit second-order transitions between folded and unfolded states, whereas close to 
zero-temperature a hierarchy of 'mixed' phases appears, where new ergodic components 
are created and where folding depends on initial conditions. The latter phases are found 
to be related to entropic discontinuities. Finally, we present results from simulation 
experiments, which show excellent agreement with the theory. 

2. Model Definitions 

We consider one-dimensional models of random hetero-polymers, where $N$ clock-state 
spin variables $\phi_i \in \{(2k+1)\pi/q;\ k = 0, \ldots, q-1\}$ describe the spatial orientations 
of successive monomer residues in planes perpendicular to the polymer's chain axis, see figure 
1. The configurational state of the system as a whole is written as $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_N)$. 
We define the Hamiltonian of the system to be the sum of three qualitatively different 
terms, $H(\boldsymbol{\phi}) = H_s(\boldsymbol{\phi}) + H_p(\boldsymbol{\phi}) + H_{Hb}(\boldsymbol{\phi})$, which are defined and interpreted as follows: 

(i) Polarity-induced energy (see figure 2): 

$$H_p(\boldsymbol{\phi}) = -\frac{J_p}{N}\sum_{ij}\xi_i\xi_j\,\delta_{\phi_i,\phi_j}$$    (1) 

This describes exchange-energies of monomer pairs generated by their polarity type, 
believed to be the main driving forces for compactification. Proteins live in an 
aqueous environment, and amino-acids of the same polarity prefer to co-align, so 
Figure 2. Illustration of polarity interactions. Every pair of monomers for which 
both $\xi_i = \xi_j$ (the two are of the same polarity, denoted in the graph by '+' or '-') and 
$\phi_i = \phi_j$ (the two are oriented towards the same side of the backbone) will give a 
reduction of the total energy. The rationale is that such an arrangement will make 
it easier for the polymer to fold into an energetically favourable conformation where 
hydrophobic monomers form the inner-residues (i.e. are shielded from the solvent) and 
hydrophilic monomers form the surface-residues (i.e. are exposed to the solvent). 



that folding allows the chain to arrange for hydrophobic and hydrophilic monomers 
to form the inner- and surface-residues of the molecule, respectively. Equation (1) 
describes this effect phenomenologically: $\xi_i$ indicates whether the monomer at site 
$i$ is hydrophobic ($\xi_i = 1$) or hydrophilic ($\xi_i = -1$), and we reduce the configuration 
energy for every pair of monomer residues which are both of the same type 
and which are also found in identical orientations relative to the backbone. 
(ii) Hydrogen-bond energy (see figure 3): 

$$H_{Hb}(\boldsymbol{\phi}) = -\sum_i\Big\{J^{L}_{Hb}\prod_{k=0}^{q-1}\delta_{\phi_{i+k+1}-\phi_{i+k},\,2\pi/q} + J^{R}_{Hb}\prod_{k=0}^{q-1}\delta_{\phi_{i+k+1}-\phi_{i+k},\,-2\pi/q}\Big\}$$    (2) 

The second contribution to the energy describes the effect of hydrogen bonding: a 
monomer pair $(i, j)$ is coupled by a hydrogen bond of strength $J^{L}_{Hb}$ or $J^{R}_{Hb}$ if and only 
if they are spatially separated by exactly $q$ lattice sites and if the relative angles 
$\phi_{k+1} - \phi_k$ of all monomers $k = i, \ldots, i+q-1$ form a local helical twist of $\pm 2\pi/q$ 
(and therefore monomer $i$ and monomer $i+q$ have the same orientation relative to 
the backbone), such that intermediate monomers do not block the formation of the 
hydrogen bond. 
Figure 3. Illustration of the hydrogen-bonding and steric energies. Left: hydrogen 
bonds of strength $J^{L}_{Hb}$ are formed between monomers $i$ and $j$ whenever $|j - i| = q$, 
where $q$ represents the number of available orientations ($q = 3$ in this graph), and at 
the same time $\prod_{k=i}^{j-1}\delta_{\phi_{k+1}-\phi_k,\,2\pi/q} = 1$ (similarly for $J^{R}_{Hb}$). The thick-dashed line in the 
left figure is a guide to the eye, indicating the helical structure of the backbone induced 
by the hydrogen bonds. Right: steric interactions impose a specific preferred relative 
angle $\alpha_i = (\phi_{i+1} - \phi_i) - (\phi_i - \phi_{i-1})$, dependent on the (largely geometrical) properties 
of the monomer type present at site $i$. 

(iii) Steric energy (see figure 3): 

$$H_s(\boldsymbol{\phi}) = -J_s\sum_i\cos[(\phi_{i+1} - \phi_i) - (\phi_i - \phi_{i-1}) - \alpha_i]$$    (3) 

This describes local short-range steric monomer-monomer interactions, favoring 
alignment of the relative angles $(\phi_{i+1} - \phi_i)$ and $(\phi_i - \phi_{i-1})$ towards a specific preferred 
direction $\alpha_i$ which depends on the type of monomer present at site $i$. 
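
The three energy contributions can be evaluated directly for any given chain configuration. The following is a minimal sketch, not taken from the paper: it assumes an open chain (terms whose indices would fall outside the chain are simply dropped), it encodes each orientation by the integer $k$ in $\phi_i = (2k+1)\pi/q$, it pairs the left-handed coupling with twists of $+2\pi/q$, and the function name `hamiltonian` and its argument names are hypothetical.

```python
import math

def hamiltonian(k_idx, xi, alpha, q, J_p, J_s, J_hb_L, J_hb_R):
    """Total energy H_p + H_Hb + H_s of one chain configuration.

    k_idx[i] in {0, ..., q-1} encodes phi_i = (2*k_idx[i] + 1)*pi/q,
    xi[i] = +1 (hydrophobic) or -1 (hydrophilic), alpha[i] is the preferred
    steric angle at site i. Open-chain boundaries are an assumption of this sketch.
    """
    N = len(k_idx)
    phi = [(2 * k + 1) * math.pi / q for k in k_idx]

    # Polarity term: -(J_p/N) * sum_{ij} xi_i xi_j delta(phi_i, phi_j),
    # evaluated per orientation as the square of the summed polarities.
    H_p = 0.0
    for k in range(q):
        total = sum(x for x, ki in zip(xi, k_idx) if ki == k)
        H_p -= J_p / N * total * total

    # Hydrogen-bond term: q successive twists of +2*pi/q (taken as "left")
    # or -2*pi/q (taken as "right"), i.e. index steps of +1 or -1 modulo q.
    H_hb = 0.0
    for i in range(N - q):
        steps = [(k_idx[i + k + 1] - k_idx[i + k]) % q for k in range(q)]
        if all(s == 1 for s in steps):
            H_hb -= J_hb_L
        if all(s == q - 1 for s in steps):
            H_hb -= J_hb_R

    # Steric term: -J_s * cos[(phi_{i+1}-phi_i) - (phi_i-phi_{i-1}) - alpha_i].
    H_s = 0.0
    for i in range(1, N - 1):
        bend = (phi[i + 1] - phi[i]) - (phi[i] - phi[i - 1])
        H_s -= J_s * math.cos(bend - alpha[i])

    return H_p + H_hb + H_s
```

For $q = 2$ the two twist directions coincide, so in this sketch both hydrogen-bond couplings contribute to the same configurations.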

The various energy scales in the problem, and thus the relative importance of the 
three types of forces, are controlled by the non-negative coupling constants $\{J_p, J^{L,R}_{Hb}, J_s\}$. 
A preference for left- or right-handed helices can be built in by modifying the balance 
between $J^{L}_{Hb}$ and $J^{R}_{Hb}$. The quenched disorder in the problem is given by the realisation 
of the (randomly drawn, but fixed) amino-acid sequence, i.e. the variables $\{\xi_i, \alpha_i\}$. We 
denote the monomer type found at location $i$ in the chain by $\lambda_i$, so that $\xi_i = \xi(\lambda_i)$ and 
$\alpha_i = \alpha(\lambda_i)$. The disorder is characterized by the distributions 

i A 

Note that for random hetero-polymers the distribution $W(\lambda)$ is simply the a priori 
distribution according to which the monomers were selected. The marginal distribution 
specifying polarity statistics is written as 

$$w[\xi] = \int d\alpha\, w[\alpha,\xi] = \frac{1}{2}(1+p)\,\delta_{\xi,1} + \frac{1}{2}(1-p)\,\delta_{\xi,-1},$$    (6) 
with $p \in [-1, 1]$. 
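
As an illustration of the polarity statistics (6), a random sequence can be drawn site by site with probability $\frac{1}{2}(1+p)$ for a hydrophobic monomer. This is only a sketch: the helper name `sample_polarities` and the fixed seed are choices made here, not part of the model definition.

```python
import random

def sample_polarities(N, p, seed=0):
    """Draw N independent polarities with P(xi=+1) = (1+p)/2, P(xi=-1) = (1-p)/2."""
    rng = random.Random(seed)  # fixed seed so the quenched sequence is reproducible
    return [1 if rng.random() < 0.5 * (1 + p) else -1 for _ in range(N)]
```

For long chains the empirical mean of the drawn $\xi_i$ approaches $p$, which is the balance parameter that enters the order parameters below.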

We will solve our model in thermal equilibrium via a suitable combination of mean- 
and random-field techniques [14], which will allow us to evaluate the free energy per 
monomer $f$ in the thermodynamic limit: 

$$f = -\lim_{N\to\infty}\frac{1}{\beta N}\log\sum_{\boldsymbol{\phi}} e^{-\beta H(\boldsymbol{\phi})}$$    (7) 

where $H(\boldsymbol{\phi}) = H_p(\boldsymbol{\phi}) + H_{Hb}(\boldsymbol{\phi}) + H_s(\boldsymbol{\phi})$. The parameter $\beta$ is an effective inverse 
temperature, which controls the amount of stochasticity in the underlying dynamics 
(with $\beta = 0$ and $\beta = \infty$ corresponding to purely random and purely deterministic 
dynamics, respectively). The effective temperature will generally depend on various 
environmental factors, such as solvent conditions. We wish to emphasise that our present 
model takes into account the folding of the hetero-polymer only as a general mechanism 
with which to realise the potential for energy gain via polarity-induced forces, without 
specifying the detailed three-dimensional structure this reduction would give rise to. 
It can consequently describe only the formation of secondary structure as the result 
of folding, not the emerging tertiary structure; this is the price to be paid for exact 
analytical solvability. 

Given the above definitions, it is natural to divide the monomers into two groups 
according to their polarity, $\{1, \ldots, N\} = I_+ \cup I_-$ with $I_\pm = \{i\,|\,\xi_i = \pm 1\}$. We note that 
$\lim_{N\to\infty}|I_\pm|/N = \frac{1}{2}(1 \pm p)$. Within each group one can define as natural observables to 
measure the degree of polymer compactification (i.e. the impact of the polarity-induced 
forces) the distribution of monomer residue orientations, $P_+(\phi;\boldsymbol{\phi})$ and $P_-(\phi;\boldsymbol{\phi})$: 

$$P_\pm(\phi) = \lim_{N\to\infty}\langle P_\pm(\phi;\boldsymbol{\phi})\rangle, \qquad P_\pm(\phi;\boldsymbol{\phi}) = \frac{1}{|I_\pm|}\sum_{i\in I_\pm}\delta_{\phi,\phi_i}$$    (8) 

where $\langle\ldots\rangle$ denotes an average over the Boltzmann distribution $P_\infty(\boldsymbol{\phi}) \sim \exp[-\beta H(\boldsymbol{\phi})]$. 
Note that, by definition, $P_\pm(\phi) \in [0,1]$ and $\sum_\phi P_\pm(\phi) = 1$. Note also that due to the 
equivalence of all absolute orientations, spontaneous symmetry breaking can occur. In 
order to measure the degree of L(eft) or R(ight) chirality of the folded state, as induced 
by the steric interactions and hydrogen bonds, we introduce the two order parameters 

$$\chi_\pm = \lim_{N\to\infty}\langle\chi_\pm(\boldsymbol{\phi})\rangle, \qquad \chi_\pm(\boldsymbol{\phi}) = \frac{1}{N}\sum_i\prod_{k=0}^{q-1}\delta_{\phi_{i+k+1}-\phi_{i+k},\,\pm 2\pi/q}$$    (9) 

Thus $\chi_+ = -\partial f/\partial J^{L}_{Hb}$ and $\chi_- = -\partial f/\partial J^{R}_{Hb}$. 
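
For a single configuration, the microscopic observables entering (8) and (9) are simple counting statistics. The sketch below assumes the same integer encoding of orientations as before ($k$ for $\phi = (2k+1)\pi/q$) and an open chain; the function names are hypothetical.

```python
def orientation_densities(k_idx, xi, q):
    """Fraction of hydrophobic (+1) and hydrophilic (-1) monomers at each orientation."""
    dens = {}
    for sign in (+1, -1):
        group = [k for k, x in zip(k_idx, xi) if x == sign]
        if group:
            dens[sign] = [group.count(k) / len(group) for k in range(q)]
        else:
            dens[sign] = [0.0] * q  # empty polarity group: no density defined
    return dens

def chirality(k_idx, q):
    """Density of complete helical turns with twist +2*pi/q and -2*pi/q."""
    N = len(k_idx)
    n_plus = n_minus = 0
    for i in range(N - q):
        steps = [(k_idx[i + k + 1] - k_idx[i + k]) % q for k in range(q)]
        n_plus += all(s == 1 for s in steps)
        n_minus += all(s == q - 1 for s in steps)
    return n_plus / N, n_minus / N
```

Averaging these quantities over configurations sampled from the Boltzmann distribution would give finite-size estimates of $P_\pm(\phi)$ and $\chi_\pm$.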

Before solving the full model it is instructive to consider the various limiting cases 
one obtains by setting specific combinations of the characteristic energies $\{J_s, J_p, J_{Hb}\}$ 
in (1)-(3) to zero. First, in the absence of polarity interactions the model reduces to a 
one-dimensional random-field Potts model with site-disorder, for which the free energy 
is known to be analytic for finite temperatures, and there can be no phase transition. 
On the other hand, in the absence of steric- and hydrogen-bond interactions the model 
reduces to a mean-field model with site disorder. The most interesting scenario, from a 
physical and a technical point of view, is the one where all three forces are included. Due 
to the long-range interactions our model is expected to show a phase transition, whereas 
the short-range interactions are expected to generate frustration phenomena such as 
hierarchies of discontinuous transitions [?] and non-analytic distribution functions for 
local observables such as Devil's Staircases [14]. An appealing feature of the model is 
that, apart from the mean field forces, it is essentially one-dimensional and thus allows 
for an exact solution based on random-field techniques such as in [16, 17, 14, 18]. 



3. Solution of the Polarity Model 

In order to identify and interpret the properties of the full model, to be analysed in 
a subsequent section, we will now first solve our model in the absence of short-range 
interactions, i.e. for $J_s = J^{L,R}_{Hb} = 0$, so that $H(\boldsymbol{\phi}) = H_p(\boldsymbol{\phi})$. 



3.1. Calculation of the Free Energy 



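Upon using the simple identity $\sum_{ij}\delta_{\phi_i,\phi_j} = \sum_\phi\sum_{ij}\delta_{\phi_i,\phi}\,\delta_{\phi_j,\phi}$ we can express the polarity 
Hamiltonian (1) in terms of the order parameters (8): 

$$H_p(\boldsymbol{\phi}) = -J_pN\sum_\phi\Big[\frac{|I_+|}{N}P_+(\phi;\boldsymbol{\phi}) - \frac{|I_-|}{N}P_-(\phi;\boldsymbol{\phi})\Big]^2$$    (10) 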



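Upon introducing delta functions to enforce the definitions (8), in integral 
representation, we obtain the following expression for the free energy per site (7): 

$$f = -\lim_{N\to\infty}\frac{1}{\beta N}\log\int\prod_\phi\big[dP_\pm(\phi)\,d\hat{P}_\pm(\phi)\big]\,e^{-\beta N G[\{P_\pm,\hat{P}_\pm\}]}$$    (11) 

where 

$$G[\{P_\pm,\hat{P}_\pm\}] = -\frac{J_p}{4}\sum_\phi\big\{(1+p)P_+(\phi) - (1-p)P_-(\phi)\big\}^2 - \frac{i}{\beta}\sum_\phi\big\{\hat{P}_+(\phi)P_+(\phi) + \hat{P}_-(\phi)P_-(\phi)\big\}$$ 
$$\qquad - \frac{1}{2\beta}(1+p)\log\sum_\phi e^{-2i\hat{P}_+(\phi)/(1+p)} - \frac{1}{2\beta}(1-p)\log\sum_\phi e^{-2i\hat{P}_-(\phi)/(1-p)}$$ 
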

In the thermodynamic limit $N \to \infty$, the integral in (11) can be evaluated via 
steepest descent. Differentiation of $G[\ldots]$ with respect to $P_\pm(\phi)$ gives the equation 
$i\hat{P}_\pm(\phi) = \mp\frac{1}{2}(1 \pm p)\beta J_p[(1+p)P_+(\phi) - (1-p)P_-(\phi)]$, with which we eliminate the 
conjugate order parameters. This results in $f = \mathrm{extr}_{\{L\}} f[\{L\}]$, with 

$$f[\{L\}] = \frac{J_p}{4}\sum_\phi L^2(\phi) - \frac{1+p}{2\beta}\log\sum_\phi e^{\beta J_pL(\phi)} - \frac{1-p}{2\beta}\log\sum_\phi e^{-\beta J_pL(\phi)}$$    (12) 






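where $L(\phi) = (1+p)P_+(\phi) - (1-p)P_-(\phi)$. Extremisation with respect to the $L(\phi)$ 
leads to a set of $q$ coupled saddle-point equations from which to solve $\{L(\phi)\}$, in terms 
of which we can then also express our original observables $P_\pm(\phi)$: 

$$L(\phi) = (1+p)\frac{e^{\beta J_pL(\phi)}}{\sum_{\phi'}e^{\beta J_pL(\phi')}} - (1-p)\frac{e^{-\beta J_pL(\phi)}}{\sum_{\phi'}e^{-\beta J_pL(\phi')}}$$    (13) 

$$P_\pm(\phi) = \frac{e^{\pm\beta J_pL(\phi)}}{\sum_{\phi'}e^{\pm\beta J_pL(\phi')}}$$    (14) 
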

Note that (13) is invariant under the transformation $\{p, L(\phi)\} \to \{-p, -L(\phi)\}\ \forall\phi$, and 
that $\sum_\phi L(\phi) = 2p$. 
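
The saddle-point equations (13) lend themselves to a simple numerical treatment. The following is a minimal sketch of a damped fixed-point iteration, assuming the form $L(\phi) = (1+p)P_+(\phi) - (1-p)P_-(\phi)$ with $P_\pm(\phi) \propto e^{\pm\beta J_pL(\phi)}$; the damping factor, iteration count and function names are choices made here for illustration, and which solution is reached depends on the starting point.

```python
import math

def solve_saddle_point(q, p, beta_Jp, L0, n_iter=5000, damping=0.5):
    """Damped iteration of L -> (1+p) P_+[L] - (1-p) P_-[L] for q orientations."""
    L = list(L0)
    assert len(L) == q
    for _ in range(n_iter):
        w_plus = [math.exp(beta_Jp * l) for l in L]
        w_minus = [math.exp(-beta_Jp * l) for l in L]
        s_plus, s_minus = sum(w_plus), sum(w_minus)
        new = [(1 + p) * a / s_plus - (1 - p) * b / s_minus
               for a, b in zip(w_plus, w_minus)]
        L = [(1 - damping) * old + damping * n for old, n in zip(L, new)]
    return L

def densities(L, beta_Jp):
    """Monomer densities P_+ and P_- that follow from a solution {L(phi)}."""
    w_plus = [math.exp(beta_Jp * l) for l in L]
    w_minus = [math.exp(-beta_Jp * l) for l in L]
    return ([a / sum(w_plus) for a in w_plus],
            [b / sum(w_minus) for b in w_minus])
```

At high temperature the iteration settles on the uniform value $2p/q$, whereas at low temperature and with a symmetry-breaking start it approaches a state in which the two polarity types occupy different orientations.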

The uniform high temperature solution, where $L(\phi) = L^* = 2p/q$ for all $\phi$ and 
therefore $P_\pm(\phi) = 1/q$ for all $\phi$, always satisfies (13). Expansion of the free energy 
(12) around the uniform solution $\{L^*\}$ allows us to determine the critical temperature 
$T_c = 1/\beta_c$ where it becomes locally unstable. For perturbations $\{\delta L\}$ orthogonal to 
$\{L^*\}$, i.e. for which $\sum_\phi\delta L(\phi) = 0$, we find 

$$f[\{L^* + \delta L\}] = f[\{L^*\}] + \frac{J_p^2}{2q}\Big(\frac{q}{2J_p} - \beta\Big)\sum_\phi\delta^2L(\phi) + O(\delta^3L)$$    (15) 

Hence a second-order phase transition to an ordered state takes place at 

$$T_c = \frac{1}{\beta_c} = \frac{2J_p}{q}$$    (16) 

(or at a higher temperature, as a first-order transition). This value is independent of 
the variable $p = \lim_{N\to\infty}\frac{1}{N}\sum_i\xi_i$, which measures the balance between hydrophobic and 
hydrophilic monomers. 
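
The quadratic coefficient of the expansion around the uniform state can be checked in a few lines. The steps below are a reconstruction, assuming the free energy $f[\{L\}] = \frac{J_p}{4}\sum_\phi L^2(\phi) - \frac{1+p}{2\beta}\log\sum_\phi e^{\beta J_pL(\phi)} - \frac{1-p}{2\beta}\log\sum_\phi e^{-\beta J_pL(\phi)}$ and the densities $P_\pm(\phi) \propto e^{\pm\beta J_pL(\phi)}$:

```latex
\begin{align*}
\frac{\partial f}{\partial L(\phi)}
  &= \frac{J_p}{2}\Big[L(\phi) - (1+p)P_+(\phi) + (1-p)P_-(\phi)\Big],
\qquad
\frac{\partial P_\pm(\phi)}{\partial L(\phi')}
   = \pm\beta J_p\,P_\pm(\phi)\big[\delta_{\phi\phi'} - P_\pm(\phi')\big], \\
\frac{\partial^2 f}{\partial L(\phi)\,\partial L(\phi')}\Big|_{P_\pm = 1/q}
  &= \frac{J_p}{2}\Big[\delta_{\phi\phi'} - \frac{2\beta J_p}{q}\Big(\delta_{\phi\phi'} - \frac{1}{q}\Big)\Big], \\
f[\{L^*+\delta L\}] - f[\{L^*\}]
  &= \frac{J_p}{4}\Big(1 - \frac{2\beta J_p}{q}\Big)\sum_\phi \delta L^2(\phi) + \ldots
   = \frac{J_p^2}{2q}\Big(\frac{q}{2J_p} - \beta\Big)\sum_\phi \delta L^2(\phi) + \ldots
\qquad \Big(\textstyle\sum_\phi \delta L(\phi) = 0\Big)
\end{align*}
```

The coefficient changes sign at $\beta J_p = q/2$, i.e. at $T = 2J_p/q$, which is where the uniform state loses local stability.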

Similarly we can find the system's ground state, for any non-trivial value of m. 
Let us define $L_g(\phi) = \lim_{T\to 0}L(\phi)$, $L_+ = \max_\phi L_g(\phi)$ and $L_- = \min_\phi L_g(\phi)$, and let 
us denote the number of $\phi$ for which $L_g(\phi) = L_+$ as $q_+ \geq 1$ and the number for which 
$L_g(\phi) = L_-$ as $q_- \geq 1$ (with $q_+ + q_- \leq q$). We assume $L_- < L_+$; one can easily convince 
oneself that the alternative $L_- = L_+$, i.e. the high temperature solution, will not give 
the ground state. Taking the $T \to 0$ limit in the saddle-point equations (13) then shows 
that $L_\pm = (p \pm 1)/q_\pm$ and that $L_g(\phi) = 0$ for all $\phi$ such that $L_- < L_g(\phi) < L_+$. Thus $L_g(\phi)$ 
can take only one of three different values. The ground state energy per monomer, 
$u = \lim_{T\to 0} f$, can subsequently be obtained as the $T \to 0$ limit of (12): 

$$u = -\frac{J_p}{4}\Big[\frac{(1+p)^2}{q_+} + \frac{(1-p)^2}{q_-}\Big]$$    (17) 

The minimum is obtained for $q_+ = q_- = 1$: there is one angle $\phi_+$ with $L_g(\phi_+) = p + 1$, 
there is one angle $\phi_-$ with $L_g(\phi_-) = p - 1$, and the remaining $q - 2$ orientations have 
$L_g(\phi) = 0$. The ground state, written in terms of the monomer densities $P_\pm(\phi)$, is 

$P_+(\phi_+) = 1$, $P_+(\phi) = 0$ for all $\phi \neq \phi_+$    (18) 
$P_-(\phi_-) = 1$, $P_-(\phi) = 0$ for all $\phi \neq \phi_-$    (19) 
All hydrophobic monomers cluster at some orientation $\phi_+$, and all hydrophilic monomers 
cluster at a different orientation $\phi_-$, which is indeed the energetically most favourable 
configuration for any value of $q$. For $q > 2$ this introduces a trivial degeneracy of the 
ground state, since the choice made for $\phi_\pm$ is constrained only by $\phi_+ \neq \phi_-$. 
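
The statement that the lowest energy is reached when each polarity type occupies a single orientation can be checked by enumeration. The sketch below is an illustration under stated assumptions: it takes the zero-temperature limit of the free energy as $u = \frac{J_p}{4}\sum_\phi L_g^2(\phi) - \frac{1}{2}(1+p)J_pL_+ + \frac{1}{2}(1-p)J_pL_-$, and it assumes that hydrophobic monomers are spread evenly over $q_+$ orientations and hydrophilic ones over $q_-$ other orientations, so that $L_+ = (1+p)/q_+$ and $L_- = -(1-p)/q_-$. The function names are hypothetical.

```python
def ground_state_energy(q_plus, q_minus, p, J_p=1.0):
    """Energy per monomer at T -> 0 for the three-valued ansatz {L_+, 0, L_-}."""
    L_plus = (1 + p) / q_plus        # value on the q_plus hydrophobic orientations
    L_minus = -(1 - p) / q_minus     # value on the q_minus hydrophilic orientations
    quadratic = 0.25 * J_p * (q_plus * L_plus ** 2 + q_minus * L_minus ** 2)
    return quadratic - 0.5 * J_p * (1 + p) * L_plus + 0.5 * J_p * (1 - p) * L_minus

def best_ground_state(q, p, J_p=1.0):
    """Scan all (q_plus, q_minus) with q_plus + q_minus <= q; return (u, q_plus, q_minus)."""
    candidates = [(ground_state_energy(a, b, p, J_p), a, b)
                  for a in range(1, q) for b in range(1, q - a + 1)]
    return min(candidates)
```

For any $|p| < 1$ the scan selects $q_+ = q_- = 1$, since spreading either polarity type over more orientations only reduces the magnitude of the quadratic energy gain.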

In general, non-trivial solutions of the non-linear fixed point equations (13) can 
only be determined numerically, due to the presence of two terms $\sum_\phi e^{\pm\beta J_pL(\phi)}$, which 
act as normalisation constants for $P_\pm(\phi)$ and couple the $q$ equations in a transcendental 
manner. However, for the two simplest scenarios $q = 2$ (i.e. $\phi \in \{-\frac{1}{2}\pi, \frac{1}{2}\pi\}$) and $q = 3$ 
(i.e. $\phi \in \{-\frac{2}{3}\pi, 0, \frac{2}{3}\pi\}$) it turns out that these terms can be transformed away, and that 
an analytical solution is available. We note that, due to the specific properties of the 
high temperature state (where all $L(\phi)$ are identical) and of the ground state (where 
the $L(\phi)$ can take only one of three possible values), the $q > 3$ phase diagrams can at 
most differ quantitatively from that of the $q = 3$ model (provided $q$ remains finite). We 
now solve our saddle-point equations (13) for $q \in \{2, 3\}$. 



3.2. Phase Diagram for q = 2 

In the case where $q = 2$ (two available orientations per monomer) we have $\phi \in \{-\frac{1}{2}\pi, \frac{1}{2}\pi\}$, 
and we define $Z = \frac{1}{2}[L(\frac{1}{2}\pi) - L(-\frac{1}{2}\pi)]$. Since the two order parameters $L(\phi)$ also 
obey $\frac{1}{2}[L(\frac{1}{2}\pi) + L(-\frac{1}{2}\pi)] = p$, one simply has 

$$L(\pm\tfrac{1}{2}\pi) = p \pm Z$$ 

Insertion into (13) leads to a single Curie-Weiss equation for $Z$: 

$$Z = \tanh(\beta J_pZ)$$    (20) 

This predicts a second-order transition at $\beta J_p = 1$, in agreement with the critical 
temperature (16) for de-stabilization of the high-temperature solution found earlier. The 
order parameter $Z$ is recognised to be simply the staggered magnetisation $N^{-1}\sum_i\xi_i\sigma_i$ we 
would have generated if we had studied the $q = 2$ model upon transforming $\phi_i = \frac{1}{2}\pi\sigma_i$, 
with $\sigma_i \in \{-1, 1\}$ (this would have led to a Mattis-type [13] Hamiltonian). The order 
parameters $P_+(\phi)$ and $P_-(\phi)$ subsequently follow in terms of the solution $Z$ of equation 
(20) as 

$$P_+(\tfrac{1}{2}\pi) = P_-(-\tfrac{1}{2}\pi) = \frac{1}{1 + e^{-2\beta J_pZ}}$$ 
$$P_+(-\tfrac{1}{2}\pi) = P_-(\tfrac{1}{2}\pi) = \frac{1}{1 + e^{2\beta J_pZ}}$$ 

For $T > T_c = J_p$ one simply recovers the uniform state $P_+(\phi) = P_-(\phi) = \frac{1}{2}$ for 
all $\phi$, as it should. Below $T_c$ the system will choose to gradually align hydrophobic 
and hydrophilic monomers and fold, with perfect alignment (or separation) of the two 
polarity types at $T = 0$. 
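
The Curie-Weiss equation $Z = \tanh(\beta J_pZ)$ is easily solved by direct iteration. The following sketch, with hypothetical function names and an arbitrary positive starting value, also evaluates the densities $P_+(\pm\frac{1}{2}\pi)$ from the resulting $Z$.

```python
import math

def solve_curie_weiss(beta_Jp, z0=0.9, tol=1e-12, max_iter=100000):
    """Iterate Z -> tanh(beta_Jp * Z) from a positive start until it stops changing."""
    z = z0
    for _ in range(max_iter):
        z_new = math.tanh(beta_Jp * z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z  # not converged within max_iter (expected only very close to beta_Jp = 1)

def densities_q2(beta_Jp):
    """Return (P_+(pi/2), P_+(-pi/2)); by symmetry these equal (P_-(-pi/2), P_-(pi/2))."""
    Z = solve_curie_weiss(beta_Jp)
    up = 1.0 / (1.0 + math.exp(-2.0 * beta_Jp * Z))
    return up, 1.0 - up
```

Above the transition ($\beta J_p < 1$) the iteration decays to $Z = 0$ and both densities equal $\frac{1}{2}$; below it the positive start selects one of the two symmetry-related ordered states.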
3.3. Phase Diagram for q = 3 

In the case where $q = 3$ (three possible orientations per monomer) we have $\phi \in 
\{-\frac{2}{3}\pi, 0, \frac{2}{3}\pi\}$. The possible solutions of our saddle-point equation (13) can be classified 
on the basis of the number of different values taken by the three order parameters 
$\{L(-\frac{2}{3}\pi), L(0), L(\frac{2}{3}\pi)\}$, as follows: 

(i) All order parameters take the same value, $L(-\frac{2}{3}\pi) = L(0) = L(\frac{2}{3}\pi) = \frac{2}{3}p$. This is 
the uniform high temperature state, which we have already encountered, and which 
according to (16) becomes locally unstable at $T_c = \frac{2}{3}J_p$. 

(ii) Exactly two order parameters take the same value. In view of the invariance of 
equation (13) under permutations of the three allowed locations $\{-\frac{2}{3}\pi, 0, \frac{2}{3}\pi\}$ we 
may without loss of generality put $L(\pm\frac{2}{3}\pi) = L_1$ and $L(0) = L_2$ (with $L_1 \neq L_2$). 

(iii) All three order parameters are different: $L(-\frac{2}{3}\pi) = L_1$, $L(0) = L_2$, $L(\frac{2}{3}\pi) = L_3$, 
with $L_1 \neq L_2 \neq L_3$. 

We will show that, as the temperature is lowered, first the type (ii) solution bifurcates 
continuously from the type (i) solution at $T_c^{I} = \frac{2}{3}J_p$, and that the type (iii) solution, in 
turn, bifurcates continuously from type (ii) at a lower temperature $T_c^{II}$. 

In order to find the type (ii) solutions, and the critical temperature for which these 
are created as bifurcations away from the uniform one, we introduce $Z = L_1 - L_2$. Thus 

$$L(\pm\tfrac{2}{3}\pi) = L_1 = \tfrac{1}{3}(2p + Z)$$ 
$$L(0) = L_2 = \tfrac{1}{3}(2p - 2Z)$$ 

Insertion shows that such states indeed solve (13), with $Z$ following from 

$$Z = F(Z;\beta J_p)$$    (21) 
$$F(Z;K) = (1+p)\frac{1 - e^{-KZ}}{2 + e^{-KZ}} - (1-p)\frac{1 - e^{KZ}}{2 + e^{KZ}}$$ 

The trivial solution $Z = 0$ of (21) brings us back to the uniform state. Bifurcations occur 
when $Z = F(Z;\beta J_p)$ and $1 = \partial_ZF(Z;\beta J_p)$; continuous bifurcations away from $Z = 0$ 
occur when $1 = \lim_{Z\to 0}\partial_ZF(Z;\beta J_p) = \frac{2}{3}\beta J_p$. This gives a second-order transition from 
state (i) to state (ii) at the critical temperature $T_c^{I} = \frac{2}{3}J_p$, i.e. precisely at the point (16) 
where the uniform state was found to de-stabilise. Since lim^^ ±00 F(Z;K) = ±| — 
and limz->o d z F(Z;K) = —\K 2 < there is no evidence for first-order transitions. 
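
The bifurcation condition for the type (ii) states can be checked numerically. The sketch below assumes the function $F(Z;K) = (1+p)\frac{1-e^{-KZ}}{2+e^{-KZ}} - (1-p)\frac{1-e^{KZ}}{2+e^{KZ}}$ with $K = \beta J_p$, estimates its slope at the origin by a central difference, and looks for a positive root of $F(Z;K) = Z$ by bisection. The bracketing interval and the function names are choices made here, and the root finder simply reports `None` when the interval shows no sign change.

```python
import math

def F(Z, K, p):
    """Right-hand side of the type (ii) fixed-point equation, with K = beta * J_p."""
    a, b = math.exp(-K * Z), math.exp(K * Z)
    return (1 + p) * (1 - a) / (2 + a) - (1 - p) * (1 - b) / (2 + b)

def slope_at_origin(K, p, h=1e-6):
    """Central-difference estimate of dF/dZ at Z = 0 (analytically 2K/3)."""
    return (F(h, K, p) - F(-h, K, p)) / (2 * h)

def type_ii_solution(K, p, lo=1e-3, hi=2.0, n_iter=200):
    """Bisection for a root of F(Z) - Z on [lo, hi]; None if no sign change is seen."""
    g = lambda Z: F(Z, K, p) - Z
    if g(lo) * g(hi) > 0:
        return None
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

The slope at the origin reaches 1 at $K = 3/2$ for every $p$, consistent with a transition at $T = \frac{2}{3}J_p$ that does not depend on the polarity balance.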

Next, in order to analyse the type (iii) solutions and to build in the normalisation 
$\sum_\phi L(\phi) = 2p$, we define $Z_1 = L_1 - L_2$ and $Z_2 = L_1 - L_3$, such that 

$$L(-\tfrac{2}{3}\pi) = L_1 = \tfrac{1}{3}(2p + Z_1 + Z_2)$$ 
$$L(0) = L_2 = \tfrac{1}{3}(2p - 2Z_1 + Z_2)$$ 
$$L(\tfrac{2}{3}\pi) = L_3 = \tfrac{1}{3}(2p + Z_1 - 2Z_2)$$ 






This reduces our saddle-point equations (13) to two coupled equations for $\{Z_1, Z_2\}$, 
which take the following form: 

$$Z_1 = F(Z_1, Z_2;\beta J_p), \qquad Z_2 = F(Z_2, Z_1;\beta J_p)$$    (22) 

$$F(Z_1, Z_2;K) = (1+p)\frac{1 - e^{-KZ_1}}{1 + e^{-KZ_1} + e^{-KZ_2}} - (1-p)\frac{1 - e^{KZ_1}}{1 + e^{KZ_1} + e^{KZ_2}}$$ 

For $\{Z_1 = 0, Z_2 \neq 0\}$ or $\{Z_2 = 0, Z_1 \neq 0\}$ we return to a state of type (ii), whereas 
the trivial solution $Z_1 = Z_2 = 0$ brings us back to state (i). Bifurcations occur when 
$(Z_1, Z_2) = \boldsymbol{F}(Z_1, Z_2)$ and $\det[1 - (D\boldsymbol{F})(Z_1, Z_2)] = 0$, where $\boldsymbol{F}: \mathbb{R}^2 \to \mathbb{R}^2$ denotes 
the non-linear mapping $(Z_1, Z_2) \to (F(Z_1, Z_2;\beta J_p), F(Z_2, Z_1;\beta J_p))$ and $D\boldsymbol{F}$ its Jacobian 
matrix. Thus, when the system is in a type (ii) state, corresponding to e.g. $Z_1 = Z$ and 
$Z_2 = 0$ with $Z$ given as the solution of (21), a continuous bifurcation is signaled by 

$$\det\begin{pmatrix} 1 - (\partial_1F)(Z,0;\beta J_p) & -(\partial_2F)(Z,0;\beta J_p) \\ -(\partial_2F)(0,Z;\beta J_p) & 1 - (\partial_1F)(0,Z;\beta J_p) \end{pmatrix} = 0$$ 

Working out the partial derivatives shows that, since one of the off-diagonal terms 
vanishes, this is equivalent to requiring either 

$$\frac{1}{\beta J_p} = \frac{1+p}{2 + e^{-\beta J_pZ}} + \frac{1-p}{2 + e^{\beta J_pZ}}$$ 

or 

$$\frac{1}{3\beta J_p} = \frac{(1+p)\,e^{-\beta J_pZ}}{(2 + e^{-\beta J_pZ})^2} + \frac{(1-p)\,e^{\beta J_pZ}}{(2 + e^{\beta J_pZ})^2}$$ 

The second equation signals a possible destabilisation within the class of type (ii) 
solutions (which can only happen when there are multiple stable type (ii) solutions, for 
which there is no evidence); the first equation describes the creation/annihilation of type 
(iii) solutions from a type (ii) one. When solved in combination with the saddle-point 
equation (21), the first equation gives the desired (second-order) (ii)$\to$(iii) transition 
line $T_c^{II}$. The solution can be represented conveniently in the form of a parametrisation 
in the $(\beta J_p, p)$ plane, with $x = \beta J_pZ \in (-\infty, \infty)$: 

$$\beta J_p(x) = \frac{1}{2}\,\frac{\cosh(x) - 1 + x\sinh(x)}{\cosh(x) - 1}$$    (23) 

$$p(x) = \frac{x\cosh(x) + 2x - 3\sinh(x)}{1 - \cosh(x) - x\sinh(x)}$$    (24) 

Note that $\lim_{x\to\mp\infty}p(x) = \pm 1$ and that $\beta J_p(x) \sim \frac{1}{2}x$ as $x \to \infty$. Equations (23)-(24) for 



the (ii)$\to$(iii) transition, together with $\beta J_p = 3/2$ (16) describing the (i)$\to$(ii) transition, 
in fact represent all phase transitions in the $q = 3$ system with polarity energies only. 
This conjecture is based on extensive numerical exploration of the solutions of the fixed- 
point equations (13). 
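
The parametrised (ii)$\to$(iii) transition line can be tabulated and cross-checked in a few lines. The sketch below assumes the parametrisation $\beta J_p(x) = \frac{1}{2}[\cosh x - 1 + x\sinh x]/[\cosh x - 1]$, $p(x) = [x\cosh x + 2x - 3\sinh x]/[1 - \cosh x - x\sinh x]$ with $x = \beta J_pZ \neq 0$, and verifies that each point satisfies both the type (ii) fixed-point equation and the marginal-stability condition $1/\beta J_p = (1+p)/(2+e^{-x}) + (1-p)/(2+e^{x})$. The function names are hypothetical.

```python
import math

def transition_point(x):
    """Point (beta*J_p, p) on the (ii)->(iii) line for parameter x != 0."""
    c, s = math.cosh(x), math.sinh(x)
    beta_Jp = 0.5 * (c - 1 + x * s) / (c - 1)
    p = (x * c + 2 * x - 3 * s) / (1 - c - x * s)
    return beta_Jp, p

def residuals(x):
    """Residuals of the fixed-point and marginality conditions at the point for x."""
    K, p = transition_point(x)
    Z = x / K
    a, b = math.exp(-x), math.exp(x)
    fixed_point = Z - ((1 + p) * (1 - a) / (2 + a) - (1 - p) * (1 - b) / (2 + b))
    marginal = 1 / K - ((1 + p) / (2 + a) + (1 - p) / (2 + b))
    return fixed_point, marginal
```

Plotting $1/\beta J_p(x)$ against $p(x)$ over a range of $x$ traces the line in the $(T/J_p, p)$ plane; for small $|x|$ the points approach $(\frac{2}{3}, 0)$, although the expression for $p(x)$ then suffers from numerical cancellation and should not be evaluated too close to $x = 0$.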

In figure 4 we show the resultant phase diagram of the polarity model for $q = 3$, 
i.e. for $\phi \in \{-\frac{2}{3}\pi, 0, \frac{2}{3}\pi\}$, in the $(T/J_p, p)$ plane. It consists of regions characterised 
by the number of different values taken by the three order parameters $\{L(\phi)\}$, and 
Figure 4. Phase diagram of the polarity model for $q = 3$, where $\phi \in \{-\frac{2}{3}\pi, 0, \frac{2}{3}\pi\}$. 
Its regions are defined in terms of the number of different values taken by the order 
parameters $\{L(\phi)\}$, and thus by the monomer distributions $\{P_\pm(\phi)\}$, at the three 
possible orientations: (i) all three $L(\phi)$ are identical, (ii) only two of the $L(\phi)$ are 
identical, (iii) all three $L(\phi)$ are different. Within our model, these three types of 
phases, which are separated by second order transitions (indicated in the figure by 
solid lines, with a tri-critical point at $(T/J_p, p) = (\frac{2}{3}, 0)$), can be interpreted as 
representing different degrees of folding. Note that, in contrast to the case $q = 2$, 
where $\phi \in \{-\frac{1}{2}\pi, \frac{1}{2}\pi\}$, here the transitions do depend on the polarity statistics as 
characterized by $p$. 



therefore by the monomer distributions $\{P_\pm(\phi)\}$ at the three possible orientations. 
All regions are separated by second-order transition lines, viz. (16) and (23)-(24). For 
$T/J_p > \frac{2}{3}$ (the high-temperature region) the only possible solution of (13), for any $p$, is 
$L(\phi) = \frac{2}{3}p$ for all $\phi$; here the monomers have no preferred orientation (to be interpreted 
as resulting in a swollen state). For $T/J_p < \frac{2}{3}$ the equilibrium solution will depend on 
the value of the polarity statistics parameter $p$. In region (ii) the monomers exhibit 
some degree of orientation preference (to be interpreted as resulting in a partially folded 
state), whereas in region (iii) one finds a highly orientation specific solution (to be 
interpreted as resulting in a fully folded state). Note that, in view of the fact that also 
for $q > 3$ the system will in equilibrium allow for at most three different values for 
the order parameters $L(\phi)$, see (17), one must expect the $q > 3$ phase diagrams to 
be qualitatively similar to the $q = 3$ one, with only $q$-dependent re-scaling and weak 
deformations of transition lines. 

In figure 5 we show the values of the three order parameters $L(\phi)$, from which the 
monomer densities $P_\pm(\phi)$ follow via (14), as a function of $\beta J_p$, for $p = 0.2$ (left graph) 
and $p = 0.4$ (right graph). These values are obtained by numerical solution of the 
saddle-point equations (13). We clearly observe the point where type (ii) solutions (two 
Figure 5. The values taken by the three order parameters $L(\phi)$ for the polarity model 
with $q = 3$, i.e. $\phi \in \{-\frac{2}{3}\pi, 0, \frac{2}{3}\pi\}$, as a function of $\beta J_p$ (i.e. $J_p/T$) and for two different 
values of $p$. They were obtained by numerical solution of the saddle-point equations 
(13). The graphs show the two phase transitions (i)$\to$(ii) and (ii)$\to$(iii) as continuous 
bifurcations. As predicted, the first transition occurs at $\beta J_p = \frac{3}{2}$ (in both graphs), 
whereas the location of the second transition depends on $p$. 



possible values for the $L(\phi)$) bifurcate from the type (i) solution (all $L(\phi)$ are identical), 
at $\beta J_p = 3/2$ for both graphs. In contrast, the location of the second bifurcation from 
type (ii) to type (iii) solutions is indeed seen to depend on the parameter $p$, as predicted. 
We also observe how for $\beta \to \infty$ the system approaches the ground state (18, 19), where 
$L(\phi) \in \{p - 1, 0, p + 1\}$. 

4. Solution of the Full Model for q = 2 

We will now turn to the full model described by the combination of all three energy 
contributions (1)-(3). Since we now have a Hamiltonian with both (site) disorder and 
short-range interactions, a simple mean-field approach such as that used in the previous 
section will no longer apply. Here our solution will be based on a suitable adaptation of 
the random-field techniques of [17], [18]. We will, for simplicity, consider only the simplest 
non-trivial case $q = 2$, where $\phi_i = \frac{1}{2}\pi\sigma_i$ with $\sigma_i \in \{-1, 1\}$. Our orientation variables can 
now be replaced by Ising spins, which leads to significant simplifications. For instance, 
the various terms in the Hamiltonian reduce to (upon dropping the irrelevant constants): 

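$$H_p(\boldsymbol{\sigma}) = -\frac{J_p}{2N}\sum_{ij}\xi_i\xi_j\sigma_i\sigma_j$$    (25) 

$$H_{Hb}(\boldsymbol{\sigma}) = -\frac{1}{2}J_{Hb}\sum_i[1 - \sigma_i\sigma_{i+1}][1 - \sigma_i\sigma_{i-1}]$$    (26) 

$$H_s(\boldsymbol{\sigma}) = -J_s\sum_i\sigma_{i+1}\eta_i\sigma_{i-1}$$    (27) 

with $J_{Hb} = \frac{1}{2}(J^{L}_{Hb} + J^{R}_{Hb})$ and with $\eta_i = \cos[\alpha_i]$. Left- and right chirality energies have 
become identical, as expected for $\phi_i \in \{-\frac{1}{2}\pi, \frac{1}{2}\pi\}$. The two 'chirality' order parameters 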
(28) 
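
The reduction of the steric term to Ising form for $q = 2$ can be verified by brute force. The sketch below checks, for all eight combinations of three neighbouring spins and with $\phi = \frac{1}{2}\pi\sigma$, that $\cos[(\phi_{i+1} - \phi_i) - (\phi_i - \phi_{i-1}) - \alpha_i]$ equals $\cos(\alpha_i)\,\sigma_{i+1}\sigma_{i-1}$; the function names are hypothetical and the check is purely illustrative.

```python
import itertools
import math

def steric_cos(s_prev, s, s_next, alpha):
    """cos[(phi_{i+1}-phi_i) - (phi_i-phi_{i-1}) - alpha] with phi = (pi/2) * sigma."""
    phi_prev, phi, phi_next = (0.5 * math.pi * t for t in (s_prev, s, s_next))
    return math.cos((phi_next - phi) - (phi - phi_prev) - alpha)

def max_deviation(alpha):
    """Largest difference from the Ising form cos(alpha) * sigma_{i+1} * sigma_{i-1}."""
    return max(abs(steric_cos(a, b, c, alpha) - math.cos(alpha) * a * c)
               for a, b, c in itertools.product((-1, 1), repeat=3))
```

The identity holds because the bending angle is always a multiple of $\pi$ for two-state orientations, so only $\cos\alpha_i$ survives as the effective disorder variable.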



The joint distribution $w[\eta, \xi]$ of the disorder variables $\{\eta_i, \xi_i\}$ (which are independent 
for different sites) follows from (4): 

$$w[\eta, \xi] = \sum_\lambda W(\lambda)\,\delta_{\xi,\xi(\lambda)}\,\delta[\eta - \cos[\alpha(\lambda)]]$$    (29) 

A 



4.1. Calculation of the Free Energy 

We note that the polarity energy can be written in terms of the 'staggered 
magnetisation' 

$$m(\boldsymbol{\sigma}) = \frac{1}{N}\sum_i\xi_i\sigma_i$$    (30) 

in the Mattis [13] form $H_p(\boldsymbol{\sigma}) = -\frac{1}{2}J_pNm^2(\boldsymbol{\sigma})$. We isolate the order parameter $m$ in 
the expression for the free energy per site (7), by inserting $1 = \int dm\,\delta[m - \frac{1}{N}\sum_i\xi_i\sigma_i]$. 
Writing the delta function in integral representation then leads to 

$$f = -\lim_{N\to\infty}\frac{1}{\beta N}\log\int dm\,d\hat{m}\,e^{-\beta NG_N(m,\hat{m})}$$    (31) 

$$G_N(m,\hat{m}) = -im\hat{m} - \frac{1}{2}J_pm^2 - \frac{1}{\beta N}\log Z_N(-i\beta\hat{m})$$    (32) 


where the complicated (short-range) part of the partition sum has now been 
concentrated in the function Zn(x): 

Z N (x) = ^2 e^ jHb ^^ 1 ~ aiai+1 ^ 1 ~ aiai ~ 1 ^ + P Js ^i ai ~ lViai+1+x ^<i cri ^ i (33) 

CT1...(7JV 

(with $\sigma_0 = \sigma_{N+1} = 0$). The integral in (31) can for $N \to \infty$ be evaluated via steepest
descent, and will be dominated by the saddle points of the exponent $G_N(m,\hat{m})$. After
elimination of $\hat{m}$ via the saddle-point equation $i\hat{m} = -J_pm$, we can thus write the
asymptotic free energy per monomer (32) as

\[ f = \mathrm{extr}_m\left\{\frac{1}{2}J_pm^2 - \lim_{N\to\infty}\frac{1}{\beta N}\log Z_N(\beta J_pm)\right\} \tag{34} \]
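The elimination of $\hat{m}$ can be spelled out in a few lines. The following is only a worked restatement of that step, using nothing beyond (32) and the definition (33); the angular brackets, which denote an average with the Boltzmann weight of (33) at $x = \beta J_pm$, are notation introduced here for illustration.

```latex
\begin{align*}
\frac{\partial G_N}{\partial m} &= -i\hat{m} - J_p m = 0
  \quad\Longrightarrow\quad i\hat{m} = -J_p m, \qquad -i\beta\hat{m} = \beta J_p m, \\
G_N\Big|_{i\hat{m}=-J_pm} &= J_p m^2 - \tfrac{1}{2}J_p m^2 - \frac{1}{\beta N}\log Z_N(\beta J_p m)
  = \tfrac{1}{2}J_p m^2 - \frac{1}{\beta N}\log Z_N(\beta J_p m), \\
\frac{\partial G_N}{\partial \hat{m}} &= 0
  \quad\Longrightarrow\quad
  m = \frac{1}{N}\frac{\partial}{\partial x}\log Z_N(x)\Big|_{x=\beta J_pm}
    = \frac{1}{N}\sum_i \xi_i\langle\sigma_i\rangle .
\end{align*}
```

The last line shows that the extremisation over $m$ is a self-consistency condition for the staggered magnetisation (30).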



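In order to calculate the partition sum (33) we will employ the random-field techniques
of [17, 18]. We condition the function $Z_N(x)$ on the values $\{\sigma_{N-1}, \sigma_N\}$ of the two spins
at the end of the chain:

\[ Z^{(N)}_{\sigma\sigma'}(x) = \sum_{\sigma_1\ldots\sigma_N}\delta_{\sigma,\sigma_{N-1}}\delta_{\sigma',\sigma_N}\; e^{\frac{1}{2}\beta J_{Hb}\sum_i(1-\sigma_i\sigma_{i+1})(1-\sigma_i\sigma_{i-1}) + \beta J_s\sum_i\sigma_{i-1}\eta_i\sigma_{i+1} + x\sum_i\sigma_i\xi_i} \tag{35} \]

with $Z_N(x) = \sum_{\sigma\sigma'=\pm 1} Z^{(N)}_{\sigma\sigma'}(x)$. The addition of an extra monomer to the chain, i.e.
$N \to N + 1$, then leads to the following recurrence relation for the conditioned partition
functions: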



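\[
\begin{pmatrix} Z^{(N+1)}_{++}(x) \\ Z^{(N+1)}_{+-}(x) \\ Z^{(N+1)}_{-+}(x) \\ Z^{(N+1)}_{--}(x) \end{pmatrix}
= M_{N+1}(x)\,T_N
\begin{pmatrix} Z^{(N)}_{++}(x) \\ Z^{(N)}_{+-}(x) \\ Z^{(N)}_{-+}(x) \\ Z^{(N)}_{--}(x) \end{pmatrix}
\tag{36}
\]

with

\[
M_i(x) = \begin{pmatrix}
e^{x\xi_i} & 0 & 0 & 0 \\
0 & e^{-x\xi_i} & 0 & 0 \\
0 & 0 & e^{x\xi_i} & 0 \\
0 & 0 & 0 & e^{-x\xi_i}
\end{pmatrix}
\tag{37}
\]

\[
T_i = \begin{pmatrix}
e^{\beta J_s\eta_i} & 0 & e^{-\beta J_s\eta_i-\beta J_{Hb}} & 0 \\
e^{-\beta J_s\eta_i+\beta J_{Hb}} & 0 & e^{\beta J_s\eta_i+2\beta J_{Hb}} & 0 \\
0 & e^{\beta J_s\eta_i+2\beta J_{Hb}} & 0 & e^{-\beta J_s\eta_i+\beta J_{Hb}} \\
0 & e^{-\beta J_s\eta_i-\beta J_{Hb}} & 0 & e^{\beta J_s\eta_i}
\end{pmatrix}
\tag{38}
\]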



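As a result we can now write the short-range partition sum $Z_N(x)$ (33) in terms of the
random matrices (37, 38), where the randomness is in the $\{\eta_i, \xi_i\}$, as

\[
Z_N(x) = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \cdot
\Big[ M_N(x)T_{N-1}\cdots M_4(x)T_3\,M_3(x)T_2 \Big]
\begin{pmatrix} Z^{(2)}_{++}(x) \\ Z^{(2)}_{+-}(x) \\ Z^{(2)}_{-+}(x) \\ Z^{(2)}_{--}(x) \end{pmatrix}
\tag{39}
\]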



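The (random) matrix product will be evaluated in terms of the following (non-negative)
stochastic quantities, which represent the different ratios of the conditioned partition
sums (35):

\[
k^{(1)}_j = e^{-2x\xi_j}\frac{Z^{(j)}_{++}}{Z^{(j)}_{+-}}, \qquad
k^{(2)}_j = e^{2x\xi_j}\frac{Z^{(j)}_{+-}}{Z^{(j)}_{-+}}, \qquad
k^{(3)}_j = e^{-2x\xi_j}\frac{Z^{(j)}_{-+}}{Z^{(j)}_{--}}
\tag{40}
\]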



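From the recurrence relation (36) it follows that the variables $k^{(\ell)}_j$ are, in turn, generated
by iteration of the following mapping:

\[
k^{(1)}_{j+1} = e^{-\beta J_{Hb}}\,
\frac{e^{\beta J_s\eta_j}k^{(1)}_jk^{(2)}_j + e^{-\beta J_s\eta_j-\beta J_{Hb}}}
     {e^{-\beta J_s\eta_j}k^{(1)}_jk^{(2)}_j + e^{\beta J_s\eta_j+\beta J_{Hb}}}
\tag{41}
\]

\[
k^{(2)}_{j+1} = k^{(3)}_j\,e^{2x\xi_j}\,
\frac{e^{-\beta J_s\eta_j}k^{(1)}_jk^{(2)}_j + e^{\beta J_s\eta_j+\beta J_{Hb}}}
     {e^{\beta J_s\eta_j+\beta J_{Hb}}k^{(2)}_jk^{(3)}_j + e^{-\beta J_s\eta_j}}
\tag{42}
\]

\[
k^{(3)}_{j+1} = e^{\beta J_{Hb}}\,
\frac{e^{\beta J_s\eta_j+\beta J_{Hb}}k^{(2)}_jk^{(3)}_j + e^{-\beta J_s\eta_j}}
     {e^{-\beta J_s\eta_j-\beta J_{Hb}}k^{(2)}_jk^{(3)}_j + e^{\beta J_s\eta_j}}
\tag{43}
\]

We now use

\[ \frac{1}{\beta N}\log Z_N(x) = \frac{1}{\beta N}\log Z^{(N)}_{--}(x) + O\Big(\frac{1}{N}\Big) \tag{44} \]

and work out the conditioned partition function $Z^{(N)}_{--}(x)$ iteratively, via the
recurrence relation (36):

\[
\frac{1}{N}\log Z^{(N)}_{--} = \frac{1}{N}\log Z^{(N-1)}_{--} - \frac{x\xi_N}{N}
+ \frac{1}{N}\log\Big\{ e^{-\beta J_s\eta_{N-1}-\beta J_{Hb}}\,k^{(2)}_{N-1}k^{(3)}_{N-1} + e^{\beta J_s\eta_{N-1}} \Big\}
\tag{45}
\]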
Further iteration of this relation gives

\[
\lim_{N\to\infty}\frac{1}{N}\log Z^{(N)}_{--}(x) = \int dk\,d\eta\; P(k,\eta)\,
\log\Big[ e^{-\beta J_s\eta-\beta J_{Hb}}\,k^{(2)}k^{(3)} + e^{\beta J_s\eta} \Big] - xp
\tag{46}
\]

with $k = (k^{(1)}, k^{(2)}, k^{(3)})$, where $p = \int d\eta \sum_\xi \xi\, w[\eta,\xi]$ (see equation (29)), and with

\[ P(k,\eta) = \lim_{N\to\infty}\frac{1}{N}\sum_i \delta[\eta-\eta_i]\,\delta[k-k_i] \tag{47} \]



Provided the stochastic process (41)-(43) is ergodic, the distribution $P(k,\eta)$ will be
identical to the (joint) stationary distribution of the pair $\{k,\eta\}$, i.e. we may write
$P(k,\eta) = \lim_{N\to\infty}\frac{1}{N}\sum_i\langle\delta[\eta-\eta_i]\delta[k-k_i]\rangle$. Since $k_i$ is always statistically independent
of $\eta_i$ according to (41)-(43) ($k_i$ depends only on those $\eta_j$ and $\xi_j$ with $j < i$), we have
$\langle\delta[\eta-\eta_i]\delta[k-k_i]\rangle = \langle\delta[\eta-\eta_i]\rangle\langle\delta[k-k_i]\rangle$. Hence $P(k,\eta) = P_\infty(k|x)\,w[\eta]$, where
$P_\infty(k|x)$ is the invariant distribution of the process (41)-(43) (which is parametrised by
$x$, due to the occurrence of $x$ in (42)) and where $w[\eta] = \sum_\xi w[\eta,\xi]$. We thereby find
(46) being replaced by

\[
\lim_{N\to\infty}\frac{1}{N}\log Z^{(N)}_{--}(x) = -xp + \int dk\; P_\infty(k|x)\int d\eta\; w[\eta]\,
\log\Big\{ e^{-\beta J_s\eta-\beta J_{Hb}}\,k^{(2)}k^{(3)} + e^{\beta J_s\eta} \Big\}
\tag{48}
\]



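As a final consequence we can now write the free energy per monomer as

\[
f = \mathrm{extr}_m\left\{ \frac{1}{2}J_pm^2 + J_pmp
- \frac{1}{\beta}\int dk\; P_\infty(k|\beta J_pm)\int d\eta\; w[\eta]\,
\log\Big[ e^{-\beta J_s\eta-\beta J_{Hb}}\,k^{(2)}k^{(3)} + e^{\beta J_s\eta} \Big] \right\}
\tag{49}
\]

where the invariant measure $P_\infty(k|x)$ of the process (41)-(43) is to be solved from

\[
P_\infty(k|x) = \int dk'\; P_\infty(k'|x)\int d\eta \sum_\xi w[\eta,\xi]\;\delta[k - F(k'|x,\eta,\xi)]
\tag{50}
\]

with

\[
\begin{pmatrix} F_1(k|x,\eta,\xi) \\ F_2(k|x,\eta,\xi) \\ F_3(k|x,\eta,\xi) \end{pmatrix}
=
\begin{pmatrix}
e^{-\beta J_{Hb}}\,\dfrac{e^{\beta J_s\eta}k_1k_2 + e^{-\beta J_s\eta-\beta J_{Hb}}}{e^{-\beta J_s\eta}k_1k_2 + e^{\beta J_s\eta+\beta J_{Hb}}} \\[3mm]
k_3\,e^{2x\xi}\,\dfrac{e^{-\beta J_s\eta}k_1k_2 + e^{\beta J_s\eta+\beta J_{Hb}}}{e^{\beta J_s\eta+\beta J_{Hb}}k_2k_3 + e^{-\beta J_s\eta}} \\[3mm]
e^{\beta J_{Hb}}\,\dfrac{e^{\beta J_s\eta+\beta J_{Hb}}k_2k_3 + e^{-\beta J_s\eta}}{e^{-\beta J_s\eta-\beta J_{Hb}}k_2k_3 + e^{\beta J_s\eta}}
\end{pmatrix}
\tag{51}
\]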



In the case of the one-dimensional random-field Ising model [14, 19], for which the
analysis is very similar, the corresponding distribution $P_\infty(k)$ is known, at least in
certain parameter regions, to become highly non-trivial and acquire the form of the
derivative of a Devil's Staircase. To our knowledge, no general analytic expression has
been derived to describe $P_\infty(k)$ for finite temperatures. Nevertheless, for the purpose
of the present paper it is only a simple numerical exercise to evaluate $P_\infty(k|x)$ directly
by iteration of (51), for values of $\{\eta, \xi\}$ drawn randomly according to $w[\eta,\xi]$.
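The numerical exercise mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `step` and `growth_rate`, the `draw` callback returning one pair $(\eta, \xi)$, the burn-in length and the use of a plain time average along a single trajectory are all assumptions made here. The sketch iterates the map (51) and accumulates the logarithm appearing in (48), so that the returned number estimates $\lim_{N\to\infty} N^{-1}\log Z_N(x)$.

```python
import math
import random


def step(k, x, eta, xi, beta, Js, Jhb):
    """One application of the map (51): returns F(k | x, eta, xi)."""
    k1, k2, k3 = k
    a, b = beta * Js * eta, beta * Jhb
    n1 = math.exp(a) * k1 * k2 + math.exp(-a - b)
    d1 = math.exp(-a) * k1 * k2 + math.exp(a + b)
    n3 = math.exp(a + b) * k2 * k3 + math.exp(-a)
    d3 = math.exp(-a - b) * k2 * k3 + math.exp(a)
    return (math.exp(-b) * n1 / d1,
            k3 * math.exp(2.0 * x * xi) * d1 / n3,
            math.exp(b) * n3 / d3)


def growth_rate(x, beta, Js, Jhb, draw, n_steps=20000, burn_in=1000, seed=0):
    """Time-average estimate of the right-hand side of (48).

    `draw(rng)` is a hypothetical user-supplied sampler returning (eta, xi)
    distributed according to w[eta, xi].
    """
    rng = random.Random(seed)
    k = (1.0, 1.0, 1.0)          # arbitrary positive starting point
    total = 0.0
    for t in range(burn_in + n_steps):
        eta, xi = draw(rng)
        if t >= burn_in:
            a, b = beta * Js * eta, beta * Jhb
            # per-monomer increment of log Z, cf. (45)
            total += -x * xi + math.log(
                math.exp(-a - b) * k[1] * k[2] + math.exp(a))
        k = step(k, x, eta, xi, beta, Js, Jhb)
    return total / n_steps


if __name__ == "__main__":
    # binary, unbiased disorder (an illustrative choice)
    binary = lambda rng: (rng.choice((-1.0, 1.0)), rng.choice((-1.0, 1.0)))
    print(growth_rate(x=0.5, beta=1.0, Js=1.0, Jhb=0.5, draw=binary))
```

A histogram of the visited values of `k` along the same trajectory would give an estimate of $P_\infty(k|x)$ itself, under the ergodicity assumption made in the text.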
4.2. Simple Limiting Cases



Before converting our general results into phase diagrams we will first carry out 
benchmark tests of our expressions, by inspecting simple limits. 

• Firstly, in the absence of short-range interactions, i.e. for $J_{Hb} = J_s = 0$, the
expression for the asymptotic free energy per site (49) should reduce to the $q = 2$
version of (12), which ought to be simply the free energy of the infinite-range Mattis
model [13]. Indeed, we find that for $J_{Hb} = J_s = 0$ the mapping (51) reduces to

\[
\begin{pmatrix} F_1(k|x,\eta,\xi) \\ F_2(k|x,\eta,\xi) \\ F_3(k|x,\eta,\xi) \end{pmatrix}
=
\begin{pmatrix} 1 \\[1mm] \dfrac{k_1k_2+1}{k_2k_3+1}\,k_3\,e^{2x\xi} \\[3mm] 1 \end{pmatrix}
\tag{52}
\]

Hence $P_\infty(k|x) = \delta[k_1-1]\,\delta[k_3-1]\,P_\infty(k_2|x)$, with

\[
P_\infty(k_2|x) = \frac{1}{2}(1+p)\,\delta[k_2 - e^{2x}] + \frac{1}{2}(1-p)\,\delta[k_2 - e^{-2x}]
\tag{53}
\]

Substitution into (49), for $J_{Hb} = J_s = 0$, gives

\[
f = \mathrm{extr}_m\left\{ \frac{1}{2}J_pm^2 - \frac{1}{\beta}\log 2\cosh(\beta J_pm) \right\}
\tag{54}
\]

which is indeed the well-known asymptotic free energy per site of an infinite-range
Mattis magnet.

• Secondly, for $J_{Hb} = 0$ and $\eta_i = \xi_i = 1$ for all $i$ (i.e. $w[\eta,\xi] = \delta[\eta-1]\,\delta[\xi-1]$ and
$p = 1$) the macroscopic laws of our model should reduce to those of the model of [22], which
describes pattern recall in recurrent neural networks with competition between
short-range and long-range information processing, for the simplest 'one-pattern'
scenario. For $J_{Hb} = 0$ and $w[\eta,\xi] = \delta[\eta-1]\,\delta[\xi-1]$ the mapping (51) becomes fully
deterministic, and takes the form

\[
\begin{pmatrix} F_1(k|x) \\ F_2(k|x) \\ F_3(k|x) \end{pmatrix}
=
\begin{pmatrix}
\dfrac{e^{\beta J_s}k_1k_2 + e^{-\beta J_s}}{e^{-\beta J_s}k_1k_2 + e^{\beta J_s}} \\[3mm]
\dfrac{e^{-\beta J_s}k_1k_2 + e^{\beta J_s}}{e^{\beta J_s}k_2k_3 + e^{-\beta J_s}}\,k_3\,e^{2x} \\[3mm]
\dfrac{e^{\beta J_s}k_2k_3 + e^{-\beta J_s}}{e^{-\beta J_s}k_2k_3 + e^{\beta J_s}}
\end{pmatrix}
\tag{55}
\]

and $P_\infty(k|x) = \delta[k - k^*(x)]$, where $k^*(x)$ denotes the fixed-point of the mapping
(55) with non-negative components, which (in line with our previous assumption
of ergodicity of the original process (41)-(43)) we assume to be unique. We observe
that (55) preserves $k_1 = k_3$, and the remaining components of the fixed-point
$k^*(x) = (k_1^*, k_2^*, k_1^*)$ must obey

\[
\begin{pmatrix} k_1^* \\ k_2^* \end{pmatrix}
=
\begin{pmatrix}
\dfrac{e^{\beta J_s}k_1^*k_2^* + e^{-\beta J_s}}{e^{-\beta J_s}k_1^*k_2^* + e^{\beta J_s}} \\[3mm]
\dfrac{e^{-\beta J_s}k_1^*k_2^* + e^{\beta J_s}}{e^{\beta J_s}k_1^*k_2^* + e^{-\beta J_s}}\,k_1^*\,e^{2x}
\end{pmatrix}
\tag{56}
\]

This (in turn) gives $(k_1^*, k_2^*) = (k^*, e^{2x})$, where (upon substituting $x = \beta J_pm$) $k^*$ is
the non-negative solution of

\[
k^* = \frac{e^{\beta(J_s+J_pm)}k^* + e^{-\beta(J_s+J_pm)}}{e^{-\beta(J_s-J_pm)}k^* + e^{\beta(J_s-J_pm)}}
\tag{57}
\]

Insertion into (49) gives us

\[
f = \mathrm{extr}_m\left\{ \frac{1}{2}J_pm^2 - \frac{1}{\beta}\log\Big[ e^{-\beta(J_s-J_pm)}k^* + e^{\beta(J_s-J_pm)} \Big] \right\}
\tag{58}
\]

It follows from (57) that the quantity $\Lambda = e^{-\beta(J_s-J_pm)}k^* + e^{\beta(J_s-J_pm)}$ occurring in
(58) obeys $(\Lambda - e^{\beta(J_s+J_pm)})(\Lambda - e^{\beta(J_s-J_pm)}) = e^{-2\beta J_s}$, which we recognise as the
eigenvalue equation of the transfer matrix

\[
\begin{pmatrix}
e^{\beta(J_s+J_pm)} & e^{-\beta J_s} \\
e^{-\beta J_s} & e^{\beta(J_s-J_pm)}
\end{pmatrix}
\tag{59}
\]

This shows that the free energy is indeed identical to that of [22].
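The second benchmark can also be checked numerically. The sketch below (function names and the choice of plain fixed-point iteration are assumptions made here for illustration) iterates (57) to its non-negative fixed point, evaluates the quantity inside the logarithm of (58), and compares it with the largest eigenvalue of the $2\times 2$ transfer matrix (59), computed from the closed-form expression for a symmetric $2\times 2$ matrix.

```python
import math


def k_star(beta, Js, Jp, m, n_iter=500):
    """Iterate (57) from k = 1 towards its non-negative fixed point."""
    up, dn = beta * (Js + Jp * m), beta * (Js - Jp * m)
    k = 1.0
    for _ in range(n_iter):
        k = (math.exp(up) * k + math.exp(-up)) / (math.exp(-dn) * k + math.exp(dn))
    return k


def lambda_from_map(beta, Js, Jp, m):
    """The argument of the logarithm in (58), built from the fixed point of (57)."""
    dn = beta * (Js - Jp * m)
    return math.exp(-dn) * k_star(beta, Js, Jp, m) + math.exp(dn)


def lambda_from_matrix(beta, Js, Jp, m):
    """Largest eigenvalue of the symmetric 2x2 transfer matrix (59)."""
    a = math.exp(beta * (Js + Jp * m))
    d = math.exp(beta * (Js - Jp * m))
    off = math.exp(-beta * Js)
    return 0.5 * (a + d) + math.sqrt(0.25 * (a - d) ** 2 + off * off)


if __name__ == "__main__":
    args = (1.0, 0.5, 1.0, 0.3)   # beta, J_s, J_p, m (illustrative values)
    print(lambda_from_map(*args), lambda_from_matrix(*args))
```

The two numbers agree because the right-hand side of (57) is a Möbius map built from a positive $2\times 2$ matrix with the same characteristic polynomial as (59), so its attracting fixed point corresponds to the dominant eigenvalue.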



4.3. Phase Diagrams and Comparison with Numerical Experiments

In order to obtain phase diagrams we finally have to calculate the local extrema of a
free energy surface $f[m]$, the argument of the extremisation in (49), which still depends
on the choice made for the statistics of the monomer properties $\{\xi, \eta\}$. Here we apply
our theory to the simple example $w[\eta,\xi] = \frac{1}{4}[\delta(\eta+1) + \delta(\eta-1)][\delta(\xi+1) + \delta(\xi-1)]$,
hence also $p = 0$. In this case the free energy surface $f[m]$ simplifies to

\[
f[m] = \frac{1}{2}J_pm^2 - \frac{1}{2\beta}\int dk\; P_\infty(k|\beta J_pm)\,
\log\Big[ \big(e^{-\beta(J_s+J_{Hb})}k_2k_3 + e^{\beta J_s}\big)\big(e^{\beta(J_s-J_{Hb})}k_2k_3 + e^{-\beta J_s}\big) \Big]
\tag{60}
\]

where the invariant measure $P_\infty(k|x)$ of the process (41)-(43) is to be solved from

\[
P_\infty(k|x) = \frac{1}{4}\int dk'\; P_\infty(k'|x)\sum_{\eta=\pm 1}\sum_{\xi=\pm 1}\delta[k - F(k'|x,\eta,\xi)]
\tag{61}
\]

with the mapping $F$ defined in (51). We determine the solution of (61) via numerical
iteration. Note that, due to $w[\xi] = w[-\xi]$, we have $P_\infty(k|x) = P_\infty(k|-x)$. Hence
$f[m] = f[-m]$, and $m = 0$ always corresponds to a saddle-point of $f[m]$. Note also that
for $J_{Hb} = 0$ (no hydrogen bonding) considerable further simplification of (60, 61) will be
possible, due to the resulting conservation of the symmetry $k_1 = k_3$ by the map (51).
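As an illustration of how the surface (60) might be evaluated in practice, the following sketch estimates $f[m]$ by iterating the map (51) with $\eta, \xi = \pm 1$ drawn with equal probabilities, and then scans a grid of $m$ values for local minima. The function names, the sample sizes, the grid and the use of the same random seed for every $m$ (which keeps the scan smooth in $m$) are choices made here, not taken from the paper; units are such that $T = 1/\beta$.

```python
import math
import random


def F(k, x, eta, xi, bJs, bJhb):
    """The map (51), with beta already absorbed into bJs and bJhb."""
    k1, k2, k3 = k
    a, b = bJs * eta, bJhb
    n1 = math.exp(a) * k1 * k2 + math.exp(-a - b)
    d1 = math.exp(-a) * k1 * k2 + math.exp(a + b)
    n3 = math.exp(a + b) * k2 * k3 + math.exp(-a)
    d3 = math.exp(-a - b) * k2 * k3 + math.exp(a)
    return (math.exp(-b) * n1 / d1,
            k3 * math.exp(2 * x * xi) * d1 / n3,
            math.exp(b) * n3 / d3)


def free_energy(m, T, Jp, Js, Jhb, n_steps=20000, burn_in=500, seed=0):
    """Sampling estimate of f[m] in (60) for unbiased binary eta and xi."""
    beta = 1.0 / T
    bJs, bJhb, x = beta * Js, beta * Jhb, beta * Jp * m
    rng = random.Random(seed)
    k, acc = (1.0, 1.0, 1.0), 0.0
    for t in range(burn_in + n_steps):
        if t >= burn_in:
            kk = k[1] * k[2]
            # the two factors are the eta = +1 and eta = -1 terms of (60)
            acc += math.log((math.exp(-bJs - bJhb) * kk + math.exp(bJs))
                            * (math.exp(bJs - bJhb) * kk + math.exp(-bJs)))
        k = F(k, x, rng.choice((-1, 1)), rng.choice((-1, 1)), bJs, bJhb)
    return 0.5 * Jp * m * m - acc / (2.0 * beta * n_steps)


def local_minima(ms, fs):
    """Grid points whose value lies strictly below both neighbours."""
    return [ms[i] for i in range(1, len(ms) - 1)
            if fs[i] < fs[i - 1] and fs[i] < fs[i + 1]]


if __name__ == "__main__":
    ms = [i / 50.0 for i in range(0, 51)]
    fs = [free_energy(m, T=2.0, Jp=8.0, Js=2.0, Jhb=1.0, n_steps=4000) for m in ms]
    print(local_minima(ms, fs))
```

At very low temperatures the exponentials in this naive form can overflow; a production version would presumably work with logarithms of the $k$ variables instead.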

Examples of the results of our analysis of the surface (60) are shown in figure 6, as
phase diagram cross-sections in the $(T, J_p)$ plane, for $\{J_s = 4, J_{Hb} = 0\}$ (left picture)
and $\{J_s = 4, J_{Hb} = 2\}$ (right picture). They involve

(i) a high-temperature phase 'P', where $m = 0$ is the only local minimum of $f[m]$ and
no folding will occur,

(ii) a phase 'F' where two equivalent $m \neq 0$ solutions minimise $f[m]$ (one positive,
one negative, reflecting the symmetry of the present model under overall reflection
$\phi_i \to \phi_i + \pi$), the 'folded state',

(iii) phases 'M' where four $m \neq 0$ solutions minimise $f[m]$ locally (two positive, two
negative).

Figure 6. Phase diagram cross-sections in the $(J_p, T)$ plane, for $J_s = 4$ and $J_{Hb} = 0$
(left graph) and $J_{Hb} = 2$ (right graph), obtained by numerical solution of (61) (which
becomes increasingly complicated as $T \to 0$). They involve a high-temperature region
'P' where $m = 0$ is the only local minimum of $f[m]$, and a region 'F' where two equivalent
$m \neq 0$ solutions (one positive, one negative) minimise $f[m]$. In the low temperature
region a series of 'mixed' phases 'M' emerge, where multiple states with different
degrees of folding can be simultaneously locally stable (four values for $m$ give local
minima). The P$\to$F transition is second-order. The F$\to$M transitions are first-order
(dynamical) transitions. In the presence of hydrogen bonds, the M phases are found
to be increasingly suppressed (see right picture).



In the M phases, the degree of folding observed will strongly depend on initial conditions
(in spite of the fact that the lowest value for $f[m]$, and hence the thermodynamic state,
corresponds only to the maximally folded state, where $|m|$ is largest). See also figure 9
below. The P$\to$F transition is an ordinary second-order transition, whereas the F$\to$M
transitions are first-order (dynamical) transitions. In the presence of hydrogen bonding,
the M phases are found to be increasingly suppressed (see right picture).

In order to illuminate the physical mechanism which produces the 'mixed' phases,
we plot in figure 7 the entropy per monomer $s = -\partial f/\partial T$ close to $T = 0$, for each of
the local minima of $f[m]$. It is seen to become non-zero, and to develop a hierarchy of
sharp peaks as a function of $J_p$ (c.f. [15]). These peaks correspond to special parameter
values for which frustration effects become dominant, and for which many energetically
equivalent states are possible. The largest value of the ground state entropy is obtained
at the first of these peaks, for $J_p \approx 11.2$; this corresponds to the location in the phase
diagram where the first of the 'mixed' phases appears, see figure 6.

Figure 7. Entropy per monomer $s = -\partial f/\partial T$ close to zero temperature (in this
graph: $T = 0.01$, $J_s = 4$ and $J_{Hb} = 0$), as a function of $J_p$, evaluated numerically
via differentiation of the free energy (49). It is seen to become non-zero and develop
a hierarchy of sharp peaks at special values of $J_p$, where local frustration is maximal.
'Mixed' phases in the phase diagram emerge at precisely these locations (see figure 6).

The qualitative features of diagrams such as those shown in figure 6 can now be
understood as follows. For large values of $\{J_p, T\}$ the short-range forces (steric forces
and hydrogen bonds) become irrelevant, and the diagram approaches that of a Mattis
model (as it should), with a second-order transition along the line $T = J_p$. For low
temperatures the simple Mattis state is disrupted by the steric interactions, which try
to enforce monomer-specific short-range order along the chain; as a result the value
needed for $J_p$ to create $m \neq 0$ states is increased (explaining the re-entrance observed in
figure 6). The complex phenomenology (reminiscent of random field models) of multiple
locally stable configurations, induced by the steric interactions, is subsequently found
to be damped by the hydrogen bonds, which act to reduce the complexity of the ground
state.
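The Mattis limit quoted above is easy to reproduce. Extremising (54) with respect to $m$ gives the self-consistency equation $m = \tanh(\beta J_pm)$, whose non-zero solutions appear below $T = J_p$. The sketch below solves it by fixed-point iteration; the function name, starting value and iteration count are illustrative assumptions, and units are such that $T = 1/\beta$.

```python
import math


def mattis_m(T, Jp, m0=0.9, n_iter=5000):
    """Solve m = tanh(Jp * m / T), the stationarity condition of (54)."""
    m = m0
    for _ in range(n_iter):
        m = math.tanh(Jp * m / T)
    return m


if __name__ == "__main__":
    for T in (0.5, 0.9, 1.1, 1.5):
        # for Jp = 1 the non-zero solution disappears as T crosses 1
        print(T, mattis_m(T, Jp=1.0))
```

Close to $T = J_p$ the iteration converges slowly (critical slowing down of the map), so the iteration count would need to grow if the transition line were to be located accurately this way.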

Next, in figure 8 we plot the equilibrium values of the 'chirality' (28) and 'polarity'
(30) order parameters as functions of the hydrogen bond strength $J_{Hb}$, for three different
values of $J_p$ (in a region of the phase diagram where there are no mixed phases, i.e.
where apart from overall reflection, the stationary state is unique). Note that $\chi$ is simply
calculated from the derivative $\partial f/\partial J_{Hb}$ (which is done numerically). The two order parameters
$\chi$ and $m$ are seen to show an opposite dependence on $J_{Hb}$ (monotonically increasing
vs. decreasing), as they should, since $\chi$ measures the degree of helical structure along
the chain, whereas $m$ measures the probability to find monomers with identical polarity
at the same side of the chain. Due to the competing roles played by the two coupling
parameters $\{J_p, J_{Hb}\}$, we see that 'helices' are favoured for large $J_{Hb}$ or small $J_p$, whereas
'folding', in the sense of efficient polarity separation, is favoured for
small $J_{Hb}$ or large $J_p$. Note that the observed incompatibility of helical structure with
polarity separation is just a reflection of the simple form we chose in this section for the
disorder distribution $w[\eta,\xi]$ (with statistically independent $\eta$ and $\xi$); the situation would
obviously have been different for distributions describing correlated disorder variables.
In the same figures we also show the results of numerical simulations, for comparison (the
markers in the two graphs). For small $J_{Hb}$ our experiments are seen to be in excellent
agreement with the theory (finite size effects are of the order of $O(N^{-1/2}) \approx 0.03$),
whereas for large $J_{Hb}$ short-range couplings become increasingly dominant, leading to
domain formation and very slow equilibration, which make it difficult in practice
to probe the equilibrium regime. In our experiments we have measured the value
of the order parameters after 120,000 iterations per spin, which for large $J_{Hb}$ is no
longer sufficient. Note that the theory also predicts the existence of repeated small
discontinuities in both order parameters; these originate from frustration-related
short-range phenomena, which induce discretisation of observable
supports [20, 21] and non-analytic integrated distribution functions (e.g. the Devil's
Staircase [19]).

Figure 8. Equilibrium values of the 'chirality' and 'polarity' order parameters $\chi$
and $m$ as functions of the hydrogen-bond strength $J_{Hb}$. Lines represent theoretical
predictions, and markers the simulation results as measured after 120,000 iterations
per monomer in a system of $N = 1000$ monomers. The values for $J_p$ were chosen as
$J_p \in \{4, 8, 12\}$ (left panel: upper graph to lower graph; right panel: lower graph to
upper graph). In all cases $T = 2$ and $J_s = 2$.

To verify our results further we have also performed simulation experiments in
the 'mixed' phase regions, where our theory predicts that the extent of polarity-driven
folding (i.e. the equilibrium value of $m$) will depend on initial conditions. In figure
9 we show the value of the 'polarity' order parameter $m$, as measured in numerical
simulations of an $N = 1000$ chain after 20,000 iterations per monomer, as a function
of its initial value $m(t = 0)$, for two different parameter settings (one, to the left, in an
M region of the phase diagram; one, to the right, in an F region of the phase diagram).
In the insets of these graphs we also plot the corresponding free energy per monomer
$f[m]$ as predicted by our theory, which shows either two $m > 0$ locally stable states (left
picture) or one $m > 0$ locally stable state (right picture), respectively. In both cases the
numerical experiments are found to verify the existence and the quantitative properties
of the expected ergodicity breaking in the M phase. We clearly observe that, in phase
M, the choice of initial conditions, in particular whether or not $m(t = 0)$ is to the left
of the free energy barrier in $f[m]$, determines the equilibrium value of $m$. We also see
that in the 'mixed' phase (left picture) the ergodic component with the smallest value
of $m$ is poorly equilibrated due to domain formation. This has also been observed for a
similar type of statistical mechanical model in [22]: in those parameter regions where
multiple states can be locally stable, different ergodic components are found
to have different equilibration time-scales.

Figure 9. Order parameter $m$ as measured in numerical simulations of an $N = 1000$
chain after 20,000 iterations/monomer ($m_{\rm fin}$), versus its initial value $m_{\rm ini} = m(t = 0)$,
for system parameters $J_s = 4$, $J_p = 10$, $J_{Hb} = 1$ and $T = 0.1$ (left picture, in the M
phase) and $J_s = 2$, $J_p = 4$, $J_{Hb} = 1$ and $T = 0.5$ (right picture, in the F phase). In
the 'folding' phase (right), our theory predicts the existence of only one $m > 0$ ergodic
component (see free energy per monomer $f[m]$, graph in the inset), at $m \approx 0.67$
(horizontal line). In the 'mixed' phase (left), our theory predicts the existence of two
$m > 0$ ergodic components (see free energy per monomer $f[m]$, graph in the inset), at
$m \approx 0.65$ (horizontal line, for $m_{\rm ini} < 0.774$) and $m \approx 1$ (for $m_{\rm ini} > 0.774$). This is
confirmed by the numerical simulations (finite size effects are expected to be of order
$\Delta m \approx 0.01$). In the $m \approx 0.65$ state of the mixed phase (left graph, horizontal
line), the system is found not yet to be fully equilibrated (signalled by a dependence of
$m_{\rm fin}$ on $m_{\rm ini}$), due to domain formation.
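The simulation protocol is not spelled out in this section. As a minimal sketch, assuming single-spin-flip Metropolis dynamics (an assumption made here, not necessarily the dynamics behind figures 8 and 9), an experiment of the kind shown in figure 9 could be organised as below. The energy is the sum of (25)-(27) with the boundary convention $\sigma_0 = \sigma_{N+1} = 0$; all function names, the padded-list layout and the way the initial state is prepared are illustrative choices, and one 'sweep' stands for $N$ attempted flips.

```python
import math
import random


def short_range(s, eta, i, Js, Jhb):
    """Site-i terms of H_Hb (26) and H_s (27); s is padded with s[0] = s[N+1] = 0."""
    hb = -0.5 * Jhb * (1 - s[i] * s[i + 1]) * (1 - s[i] * s[i - 1])
    st = -Js * s[i + 1] * eta[i] * s[i - 1]
    return hb + st


def total_energy(s, eta, xi, Jp, Js, Jhb):
    n = len(s) - 2
    mag = sum(xi[i] * s[i] for i in range(1, n + 1))
    e = -0.5 * Jp * mag * mag / n              # Mattis form of H_p
    return e + sum(short_range(s, eta, i, Js, Jhb) for i in range(1, n + 1))


def flip_cost(s, eta, xi, j, mag, Jp, Js, Jhb):
    """Energy change for flipping spin j, using only the terms that involve it."""
    n = len(s) - 2
    sites = [i for i in (j - 1, j, j + 1) if 1 <= i <= n]
    before = sum(short_range(s, eta, i, Js, Jhb) for i in sites)
    s[j] = -s[j]
    after = sum(short_range(s, eta, i, Js, Jhb) for i in sites)
    s[j] = -s[j]                               # restore the original spin
    new_mag = mag - 2 * xi[j] * s[j]
    return after - before - 0.5 * Jp * (new_mag ** 2 - mag ** 2) / n


def simulate(n, T, Jp, Js, Jhb, m_init, sweeps, seed=0):
    """Return m after `sweeps` Metropolis sweeps, starting near m = m_init."""
    rng = random.Random(seed)
    eta = [0] + [rng.choice((-1, 1)) for _ in range(n)] + [0]
    xi = [0] + [rng.choice((-1, 1)) for _ in range(n)] + [0]
    # each spin is aligned with xi_i with probability (1 + m_init) / 2
    s = [0] + [xi[i] if rng.random() < 0.5 * (1 + m_init) else -xi[i]
               for i in range(1, n + 1)] + [0]
    mag = sum(xi[i] * s[i] for i in range(1, n + 1))
    for _ in range(sweeps * n):
        j = rng.randrange(1, n + 1)
        dE = flip_cost(s, eta, xi, j, mag, Jp, Js, Jhb)
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            mag -= 2 * xi[j] * s[j]
            s[j] = -s[j]
    return mag / n


if __name__ == "__main__":
    for m0 in (0.2, 0.5, 0.9):
        print(m0, simulate(200, T=0.5, Jp=4.0, Js=2.0, Jhb=1.0, m_init=m0, sweeps=200))
```

Repeating the run over a range of `m_init` values and plotting the returned value against `m_init` gives the kind of diagnostic described in the text; the chain length and run time used here are far smaller than those quoted for figure 9.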



5. Discussion 



In this paper we have presented an exactly solvable model for secondary structure 
formation in random hetero-polymers, consisting of amino-acid monomers which are 
allowed to interact in three qualitatively different ways: via (short-range) steric 
interactions, via (short-range) hydrogen-bonding, and via (long-range) polarity-induced 
forces. Our strategy was to exploit the one-dimensional nature of the monomer chain, 
and to separate questions relating to secondary structure formation from those relating 
to tertiary structure formation by taking into account the effects of the latter only via an 
effective energy term which measures the potential for overall energy reduction by folding 



Structure Formation in Random Hetero- Polymers 



23 



(rather than trying to find the actual state realising this potential). This allows us to 
move away from real-space calculations towards a calculation in $1 + \infty$ dimensions,
where the statistical mechanical variables represent the orientations of the monomer 
residues relative to the chain axis. Solution can now be based on a combination of 
mean-field and random transfer-matrix techniques, which in one-dimensional models are 
known to reduce the evaluation of the partition function to a relatively simple numerical 
problem. Due to the presence of long-range interactions (via polarity-induced forces),
phase transitions are still possible (and do indeed occur) at finite temperatures. 

Our order parameters measure the degree of polarity-induced collapse of the chain, 
as well as the degree of helicity along the chain. The phase diagrams exhibit second- 
order transitions between 'folded' and 'unfolded' states, and, for low temperature and 
sufficiently strong steric interactions, a series of 'mixed' phases (separated from the 
previous ones by discontinuous transitions) where, in addition to the maximally folded 
states, specific partially folded states can also be locally stable. The latter phases 
are created at parameter values for which frustration is maximal, and where the 
entropy becomes particularly large. Although in the present paper we have mostly 
restricted ourselves (for simplicity) to chains with just a small number of possible 
orientations per monomer, it is not fundamentally more difficult to solve the model 
for larger degrees of orientational freedom (although certain adaptations are needed 
before the continuum limit can be taken, such as a re-scaling of the effective long 
range coupling J p and/or of the number of relative monomer orientations where polarity 
interactions occur). We have only evaluated our theory for the simplest choice of disorder 
statistics (the statistical properties of the monomers, and their physical properties 
such as polarity and steric constraints). Here the emerging picture is already quite 
satisfactory, in that explicit analytical results can be obtained, and that the predicted 
physical behaviour of the monomer chain (confirmed qualitatively and quantitatively by 
numerical simulations) makes perfect sense in the context of proteins: the polarity forces 
drive the transition to a collapsed state, the steric forces introduce monomer specificity, 
and the hydrogen bonds stabilise the conformation by damping the frustration-induced 
multiplicity of states. 

There is still much scope for increasing the biological realism and relevance of our 
model without affecting its analytical solvability, at different levels. Firstly, without 
changing the model or its techniques for solution, one can easily consider more realistic 
choices for the monomer statistics, such as non-binary polarity variables, or for the
orientational freedom of the monomers (for instance, the hydrogen-bond term may 
be modified to favour helix-type formations at the biologically observed ratio of 3.6 
monomers per turn). Secondly, at a next level of sophistication one could construct a 
more realistic form for the polarity induced energy contribution (breaking the present 
hydrophobic-hydrophilic symmetry, and based upon biological data), or more realistic 
representations of the degrees of freedom of the individual peptide units and residues 
(i.e. three angles per monomer, rather than one), or the action of 'chaperones' (via 
external fields). Solution of such models would not be essentially more difficult than 
that of the examples worked out here; the main problem would rather be to extract
the canonical definitions of the ingredients to be incorporated into the model from the 
available biological data. In contrast, qualitatively different and more difficult types of 
modification and extension would be to consider non-random hetero-polymers, where 
the monomer properties and statistics are chosen such as to mimic real proteins, or to 
try to analyse the interplay between secondary and tertiary structure formation. Here 
new techniques for solution will have to come in. 

The main problem in the statistical mechanical study of folding proteins appears 
to be the construction of models where an acceptable and productive balance can be 
found between analytical solvability and biological realism. We believe that our present 
model might point to a new direction where this might be achieved. 